marck.vaisman@microsoft.com
marck.vaisman@georgetown.edu
The OG Data Science Cheat Sheet
Data Science can be considered a:
Science: Data science is viewed as a continuation of empirical science, which has always been centered around data, with historical examples like Kepler’s use of data to prove Copernicus’ theory.
Research Paradigm: It represents a shift in research methodology, moving from a deductive approach to a more inductive approach due to the abundance of data and computational resources.
Research Method: Data science is used to discover new concepts, measure their prevalence, assess causal effects, and make predictions, transforming the research process.
Discipline: The field is inherently interdisciplinary, integrating knowledge from domains such as computer science, mathematics, statistics, and specific application domains.
Workflow: It involves a series of steps including data collection, exploratory data analysis, modeling, and communication of results.
Profession: The multifaceted nature of data science work includes aspects of science, engineering, mathematics, statistics, and domain expertise, making it a distinct professional field.
What would a Data Scientist do?
EDISON Data Science Framework (EDSF): This framework includes a Competency Framework for Data Science (CF-DS) and a defined body of knowledge (DS-BoK). The CF-DS is developed around five major knowledge area groups: Data Analytics, Data Science Engineering, Data Management, Research Methods and Project Management, and Domain Related Competencies and Business Analytics Competencies. These areas define the explicit skills and knowledge that exemplify competence in data science.
AIS IS 2010 Curriculum Guidelines: This curriculum is designed to educate and prepare graduates to enter the workforce by equipping them with knowledge and skills in three categories: IS-specific knowledge and skills, foundational knowledge and skills, and domain fundamentals. The “IS 2010” report is a collaborative effort between the Association for Computing Machinery (ACM) and the Association for Information Systems (AIS).
Business Higher Education Forum (BHEF) Data Science and Analytics Competency Map: This map lists Data Science concepts and principles, tiered by when and where they are learned.
ACM and IADSS Competencies: These documents provide a high-level list of competencies that undergraduate Data Science students should learn, with competencies directly comparable to EDISON’s CF-DS.
Park City Math Institute Curriculum Guidelines for Undergraduate Programs in Data Science: This guideline does not elaborate on how to integrate application-domain knowledge into the curriculum, but it recognizes the importance of domain-related knowledge for the practical work of a Data Scientist.
(Schmitt et al. 2023; Weiser et al. 2022; Hazzan and Mike 2023)
The difference between a skill and a competency is often related to the scope and integration of knowledge, abilities, and behaviors. A skill is typically understood as a specific learned activity that can be performed, often something that can be developed through practice. Competencies, on the other hand, are broader and include a combination of skills, knowledge, and attributes that enable someone to perform effectively in a job or situation [1].
Competencies are often described as more holistic, encompassing not just the ability to perform a task (skill) but also the understanding (knowledge) and the appropriate application (attributes) of that skill in various contexts. They reflect a person’s capability to apply or use a set of related knowledge, skills, and abilities required to successfully perform “critical work functions” or tasks in a defined work setting [1].
In summary, while skills are specific to certain tasks, competencies are more comprehensive and relate to the overall ability to perform a job effectively, which includes a combination of multiple skills, the knowledge of when and how to use them, and the attitude or behavior to perform them successfully [1].
Computational
Statistical
Mathematical
Application Domain
Data
(Weiser et al. 2022; “Toward Foundations for Data Science and Analytics: A Knowledge Framework for Professional Standards” 2020; Fayyad and Hamutcu 2022; Hazzan and Mike 2023; Adhikari and Jordan 2021; Cuadrado-Gallego and Demchenko 2023)
What are you going to do to help fix this?
Standardize Roles: Adopt a simplified, anchored categorization of job roles with clear definitions and expectations to reduce confusion and align understanding across the industry.
Utilize Frameworks: Refer to established frameworks such as the IADSS Data Science Knowledge Framework to converge on the body of knowledge specification and ensure consistency in role definitions.
Skills-Based Assessment: Implement objective, skills-based assessments to standardize the evaluation of data science professionals and ensure alignment with role requirements.
Industry-Specific Classifications: Extend the work to include industry-specific role classifications that cater to the unique needs and contexts of different sectors.
Benchmarking and Comparison: Compare and contrast standardized role definitions with actual industry practices, such as analyzing LinkedIn job posts, to ensure relevance and applicability.
Communication and Education: Communicate the process and expectations clearly to all stakeholders, including hiring managers, executives, educators, and aspiring data science professionals, to align expectations.
Continuous Evolution: Recognize that the field is dynamic, with new titles and roles emerging, and be open to further subclassification and evolution of data scientist roles.
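One way to picture the benchmarking step above is to score how closely the skills listed in a job post overlap with each standardized role definition. The sketch below uses Jaccard similarity for that comparison. The role names, skill sets, and function names are hypothetical placeholders, not taken from any published framework, and real job posts would first need their skills extracted from free text.

```python
def jaccard(a, b):
    # Jaccard similarity: size of the intersection over size of the union.
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)

# Hypothetical standardized role definitions (illustrative only).
ROLE_DEFINITIONS = {
    "data analyst": {"sql", "visualization", "statistics"},
    "data engineer": {"sql", "pipelines", "cloud"},
    "ml engineer": {"python", "modeling", "deployment"},
}

def closest_role(post_skills, roles=ROLE_DEFINITIONS):
    # Score the post against every role and return the best match.
    scores = {name: jaccard(post_skills, skills) for name, skills in roles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical skills extracted from one job post.
post = {"sql", "pipelines", "cloud", "python"}
role, score = closest_role(post)
print(role, round(score, 2))
```

A low best score across many posts would suggest that the standardized definitions have drifted from industry practice, or that a new role is emerging.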
text-embedding-ada-002
GPT4
https://wahalulu.github.io/data-council-2024-effective-data-practitioner/